On Compiling Data Mining Tasks to PDDL
نویسندگان
چکیده
Data mining is a difficult task that relies on an exploratory and analytic process of large quantities of data in order to discover meaningful patterns and rules. It requires complex methodologies, and the increasing heterogeneity and complexity of available data requires some skills to build the data mining processes, or knowledge flows. The goal of this work is to describe data-mining processes in terms of Automated Planning, which will allow us to automatize the data-mining knowledge flow construction. The work is based on the use of standards both in data mining and automated-planning communities. We use PMML (Predictive Model Markup Language) to describe data mining tasks. From the PMML, a problem description in PDDL can be generated, so any current planning system can be used to generate a plan. This plan is, again, translated to a KFML format (Knowledge Flow file for the WEKA tool), so the plan or data-mining workflow can be executed in WEKA. In this manuscript we describe the languages, how the translation from PMML to PDDL, and from a plan to KFML are performed, and the complete architecture of our system.
منابع مشابه
Using automated planning for improving data mining processes
This paper presents a distributed architecture for automating data mining processes using standard languages. Data mining is a difficult task that relies on an exploratory and analytic process of processing large quantities of data in order to discover meaningful patterns. The increasing heterogeneity and complexity of available data requires some expert knowledge on how to combine the multiple...
متن کاملExploiting ontologies and higher order knowledge in relational data mining Doctoral Thesis
Present day knowledge discovery tasks require mining heterogeneous and structured data and knowledge sources. The key enabling factors for performing these tasks include efficient exploitation of knowledge about the domain of discovery and utilizing meta knowledge about the data mining process, which facilitates the construction of complex workflows consisting of highly specialized algorithms. ...
متن کاملCompiling and Executing PDDL in Picat
The declarative language Picat has recently entered the scene of constraint logic programming, in particular thanks to the efficiency of its planning library that exploits a clever implementation of tabling, inherithed in part from B-Prolog. Planning benchmarks, used in competitions, are defined in the language PDDL and this implied that Picat users were forced to reimplement those models withi...
متن کاملPerform Three Data Mining Tasks with Crowdsourcing Process
For data mining studies, because of the complexity of doing feature selection process in tasks by hand, we need to send some of labeling to the workers with crowdsourcing activities. The process of outsourcing data mining tasks to users is often handled by software systems without enough knowledge of the age or geography of the users' residence. Uncertainty about the performance of virtual user...
متن کاملAnalysis of Pre-processing and Post-processing Methods and Using Data Mining to Diagnose Heart Diseases
Today, a great deal of data is generated in the medical field. Acquiring useful knowledge from this raw data requires data processing and detection of meaningful patterns and this objective can be achieved through data mining. Using data mining to diagnose and prognose heart diseases has become one of the areas of interest for researchers in recent years. In this study, the literature on the ap...
متن کامل